# Fair matroid submodular maximization

This folder contains the codes for paper titled **"Fairness in Streaming Submodular Maximization over a Matroid Constraint"**.

Each experiment requires a submodular function, a matroid, both accessed through oracles, and a fairness constraint.
Oracles are implemented for several submodular functions and matroids; to implement a new submodular
function inherit from `SubmodularFunction` class and to implement a new matroid inherit from `Matroid` class.

## Compiling the code

One can compile all the files with c++17 or later, beside the ones used for preparing the inputs, described below.

## Downloading and preparing datasets

* To run the maximum coverage experiment,
download the files `soc-pokec-relationships.txt` and `soc-pokec-profiles.txt` from [https://snap.stanford.edu/data/soc-Pokec.html](https://snap.stanford.edu/data/soc-Pokec.html) and save them in the same directory containing the preprocessing scripts (`extract-attributes.cpp`, `color-vertices.cpp`, `statistics_bmi.cpp`, `clean_graph_for_bmi.cpp`). Run `extract-attributes.cpp` which should generate `filtered-attributes.txt` file. Next, run `color-vertices.cpp` and `statistics_BMI.cpp` which should generate `color_age_1.txt` and `color-BMI.txt` files, respectively. Finally, run `clean_graph_for_BMI.cpp` which should generate `BMI-soc-pokec-relationships.txt` file. Update the path in `graph.cc` (in `name_to_filename` map) to point to `BMI-soc-pokec-relationships.txt`, `color_age_1.txt` and `color-BMI.txt`.

* To run the exemplar-based clustering experiment, download the data from [https://archive.ics.uci.edu/ml/datasets/Bank+Marketing](https://archive.ics.uci.edu/ml/datasets/Bank+Marketing). Run `bank_input_converter_main.cc` on the downloaded data (`bank`) which converts the input to the format that the oracle function expects. Update the `input_path` in `ClusteringExperiment` function in the main file to point to the converted features.

* To run the movie recommendation experiment, use the `prepare_movies.py` script from <https://github.com/dj3500/movielens-matrix-completion> and follow the instructions there. Place the files `movies.dat` (from the MovieLens-1M archive), `U.txt` and `VT.txt` (generated by the script) in some directory, and update the path to that directory in `movies_data.cc`.


## Running experiments

Refer to the `main.cc` file for examples of how the experiments can be run. Please keep in mind to update the paths for reading the input (as explained above)  and writing the results (in `exp_base_path` in `main.cc`).  The "f" value in the result files correspond to the submodular objective value, "rank" to the rank $k$ of the matroid, and "error" to the violation of the fairness constraint $\mathrm{err}(S)$.

## Example Makefile

One can create a file `Makefile` with the following content.

	SRC_FILES := $(wildcard *.cc)
	H_FILES := $(wildcard *.h)
	CXXFLAGS := -O3 -W -Wall -Wshadow -Wno-unused-parameter -Wno-sign-compare -std=c++17
	BIN := fair-submodular.exe

	$(BIN): $(SRC_FILES) $(H_FILES)
			g++ $(CXXFLAGS) -o $@ $(SRC_FILES)

	all: $(BIN)

Then running `make` will build the code.
